Clostridium indolis DSM 755T is a bacterium commonly found in soils and the feces of birds
and mammals. Despite its prevalence, little is known about the ecology or physiology of this
species. However, close relatives, C. saccharolyticum and C. hathewayi, have demonstrated
interesting metabolic potentials related to plant degradation and human health. The genome
of C. indolis DSM 755T reveals an abundance of genes in functional groups associated with
the transport and utilization of carbohydrates, as well as citrate, lactate, and aromatics.
Ecologically relevant gene clusters related to nitrogen fixation and a unique type of bacterial
microcompartment, the CoAT BMC, are also detected. Our genome analysis suggests
hypotheses to be tested in future culture-based work to better understand the physiology of
this poorly described species.



Abbreviations: DSM: German Collection of Microorganisms and Cell Cultures
(Braunschweig, Germany), ATCC: American Type Culture Collection (Manassas, VA, USA), 
Introduction



The C. saccharolyticum species group is a poorly described and taxonomically confusing
clade in the Lachnospiraceae, a family within the Clostridiales that includes members of
clostridial cluster XIVa [1]. This group includes C. indolis, C. sphenoides,
C. methoxybenzovorans, C. celerecrescens, and Desulfotomaculum guttoideum, none of which
are well studied (Figure 1). C. saccharolyticum has gained attention because its
saccharolytic capacity was shown to be syntrophic with the cellulolytic activity of
Bacteroides cellulosolvens in co-culture, enabling the conversion of cellulose to ethanol
in a single step [6,7]. Members of this group, such as C. celerecrescens, are themselves
cellulolytic [8], and others are known to degrade unusual substrates such as methylated
aromatic compounds (C. methoxybenzovorans) [9] and the insecticide lindane
(C. sphenoides) [10]. C. indolis was targeted for whole genome sequencing to provide
insight into the genetic potential of this taxon that could then direct experimental
efforts to understand its physiology and ecology.
Classification and features

The general features of Clostridium indolis DSM 755T are listed in Table 1. C. indolis
DSM 755T was originally named for its ability to hydrolyze tryptophan to indole, pyruvate,
and ammonia [23] in the classic indole test used to distinguish bacterial species. It has
been isolated from soil [24], feces [25], and clinical samples from infections [27].
Despite its prevalence, C. indolis is not well characterized, and there are conflicting
reports about its physiology. It is described as a sulfate reducer with the ability to
ferment some simple sugars, pectin, pectate, mannitol, and galacturonate, and to convert
pyruvate to acetate, formate, ethanol, and butyrate [28]. According to this source, neither
lactate nor citrate is utilized; however, other studies demonstrate that fecal isolates
closely related to C. indolis may utilize lactate [29], and that the type strain DSM 755T
utilizes citrate [30]. It is unclear whether C. indolis is able to make use of a wider
range of sugars or break down complex carbohydrates; however, growth is reported to be
stimulated by fermentable carbohydrates [28].
Figure 1. Phylogenetic tree based on 16S rRNA gene sequences highlighting the position of Clostridium indolis relative
to other type strains (T) within the Lachnospiraceae. The strains and their corresponding NCBI accession numbers (and,
when applicable, draft sequence coordinates) for 16S rRNA genes are: Desulfotomaculum guttoideum strain DSM
4024T, Y11568; C. sphenoides ATCC 19403T, AB075772; C. celerecrescens DSM 5628T, X71848; C. indolis DSM
755T, Pending release by JGI: 1620643-1622056; C. methoxybenzovorans SR3, AF067965; C. saccharolyticum WM1T,
NC_014376: 18567-20085; C. algidixylanolyticum SPL73T, AF092549; C. hathewayi DSM 13479T, ADLN00000000:
202-1639; Eubacterium eligens, L34420; Ruminococcus gnavus ATCC 29149T, X94967; R. torques ATCC
27756T, L76604; E. rectale, L34627; Roseburia intestinalis L1-82T, AJ312385; R. hominis A2-183T, AJ270482;
C. jejuense HY-35-12T, AY494606; C. xylanovorans HESP1T, AF116920; C. phytofermentans ISDgT, CP000885:
15754-17276. The tree uses sequences aligned by MUSCLE, and was inferred using the Neighbor-Joining method [2].
The optimal tree with the sum of branch lengths = 0.50791241 is shown. The percentages of replicate trees in which
the associated taxa clustered together in the bootstrap test (500 replicates) are shown next to the branches [3]. The
tree is drawn to scale, with branch lengths in the same units as those of the evolutionary distances used to infer the
phylogenetic tree. The evolutionary distances were computed using the Maximum Composite Likelihood method [4]
and are in the units of the number of base substitutions per site. Evolutionary analyses were conducted in MEGA 5 [5].
C. stercorarium ATCC 35414T, CP003992: 856992-858513 was used as an outgroup.
Table 1. Classification and general features of Clostridium indolis DSM 755T

MIGS ID | Property | Term | Evidence code
 | Current classification | Domain Bacteria | TAS [11]
 | | Phylum Firmicutes | TAS
 | | Class Clostridia | TAS
 | | Order Clostridiales | TAS [17,18]
 | | Family Lachnospiraceae | TAS [15,19]
 | | Genus Clostridium | TAS [17,20,21]
 | | Species Clostridium indolis | TAS [17,22]
 | | Type strain DSM 755 |
 | Gram stain | Negative | TAS [23,24]
 | Cell shape | Rod | TAS [23,24]
 | Motility | Motile | TAS [23,24]
 | Sporulation | Terminal, spherical spores | TAS [23,24]
 | Temperature range | Mesophilic | TAS [23,24]
 | Optimum temperature | 37°C | TAS [23,24]
 | Carbon sources | Glucose, lactose, sucrose, mannitol, pectin, pyruvate, others | TAS
 | Terminal electron receptor | Sulfate | TAS [23,24]
 | Indole test | Positive | TAS [23,24]
MIGS-6 | Habitat | Isolated from soil, feces, wounds | TAS [24,25]
MIGS-6.3 | Salinity | Inhibited by 6.5% NaCl | TAS [23,24]
MIGS-22 | Oxygen | Anaerobic | TAS [23,24]
MIGS-15 | Biotic relationship | Free living and host associated | TAS [24,25]
MIGS-14 | Pathogenicity | No | NAS
MIGS-4 | Geographic location | Soil, feces | TAS [24,25]

Evidence codes - IDA: Inferred from Direct Assay; TAS: Traceable Author Statement (i.e., a direct
report exists in the literature); NAS: Non-traceable Author Statement (i.e., not directly observed for the
living, isolated sample, but based on a generally accepted property for the species, or anecdotal
evidence). These evidence codes are from the Gene Ontology project [26].



Genome sequencing information 

Genome project history 

The genome was selected based on the relatedness of C. indolis DSM 755T to
C. saccharolyticum, an organism with interesting saccharolytic and syntrophic properties.
The genome sequence was completed on May 2, 2013, and presented for public access on
June 3, 2013. Quality assurance and annotation were done by the DOE Joint Genome Institute
(JGI) as described below. Table 2 presents a summary of the project information and its
association with MIGS version 2.0 compliance [31].

Table 2. Project information

MIGS ID | Property | Term
MIGS-31 | Finishing quality | Improved Draft
MIGS-28 | Libraries used | Shotgun and long insert mate pair (Illumina), SMRTbell™ (PacBio)
MIGS-29 | Sequencing platforms | Illumina and PacBio
MIGS-31.2 | Fold coverage | 759.7× (Illumina), 51.6× (PacBio)
MIGS-30 | Assemblers | Velvet, AllpathsLG
MIGS-32 | Gene calling method | Prodigal, GenePRIMP
 | Genome Database release | June 3, 2013 (IMG)
 | Genbank ID | Pending release by JGI
 | Genbank Date of Release | Pending release by JGI
 | GOLD ID | Gi22434
 | Project relevance | Anaerobic plant degradation

Growth conditions and DNA isolation

C. indolis DSM 755T was cultivated anaerobically on GS2 medium as described elsewhere [32].
DNA for sequencing was extracted using the DNA Isolation Bacterial Protocol available
through the JGI (http://www.jgi.doe.gov). The quality of the extracted DNA was assessed by
gel electrophoresis and NanoDrop (ThermoScientific, Wilmington, DE) according to the JGI
recommendations, and the quantity was measured using the Quant-iT™ Picogreen assay kit
(Invitrogen, Carlsbad, CA) as directed.



Genome sequencing and assembly 

The draft genome of C. indolis was generated at the DOE Joint Genome Institute (JGI) using
a hybrid of the Illumina and Pacific Biosciences (PacBio) technologies. An Illumina std
shotgun library and a long insert mate pair library were constructed and sequenced using
the Illumina HiSeq 2000 platform [33]. 16,165,490 reads totaling 2,424.8 Mb were generated
from the std shotgun library, and 26,787,478 reads totaling 2,437.7 Mb were generated from
the long insert mate pair library. A PacBio SMRTbell™ library was constructed and sequenced
on the PacBio RS platform. 99,448 raw PacBio reads yielded 118,743 adapter-trimmed and
quality-filtered subreads totaling 330.2 Mb. All general aspects of library construction
and sequencing performed at the JGI can be found at http://www.jgi.doe.gov. All raw
Illumina sequence data were passed through DUK, a filtering program developed at JGI,
which removes known Illumina sequencing and library preparation artifacts [34]. Filtered
Illumina and PacBio reads were assembled using AllpathsLG (PrepareAllpathsInputs:
PHRED_64=1 PLOIDY=1 FRAG_COVERAGE=50 JUMP_COVERAGE=25; RunAllpathsLG: THREADS=8
RUN=std_pairs TARGETS=standard VAPI_WARN_ONLY=True OVERWRITE=True) [35]. The final draft
assembly contained 1 contig in 1 scaffold. The total size of the genome is 6.4 Mb. The
final assembly is based on 2,424.6 Mb of Illumina Std PE, 2,437.6 Mb of Illumina CLIP PE,
and 330.2 Mb of PacBio post-filtered data, which provide an average 759.7× Illumina
coverage and 51.6× PacBio coverage of the genome, respectively.
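
The reported fold coverage follows from dividing the filtered data volume by the genome size. The short Python sketch below reproduces that arithmetic from the figures quoted in this section; the function name and variable names are illustrative choices, and the rounded genome size of 6.4 Mb is assumed as the denominator because it is the value stated in the text.

```python
def fold_coverage(total_bases_mb: float, genome_size_mb: float) -> float:
    """Average fold coverage: sequenced bases divided by genome size (same units)."""
    if genome_size_mb <= 0:
        raise ValueError("genome size must be positive")
    return total_bases_mb / genome_size_mb


GENOME_SIZE_MB = 6.4            # rounded genome size quoted in the text (assumed denominator)
illumina_mb = 2424.6 + 2437.6   # Illumina Std PE + CLIP PE post-filter data
pacbio_mb = 330.2               # PacBio post-filter subreads

illumina_cov = fold_coverage(illumina_mb, GENOME_SIZE_MB)
pacbio_cov = fold_coverage(pacbio_mb, GENOME_SIZE_MB)
print(f"Illumina: {illumina_cov:.1f}x, PacBio: {pacbio_cov:.1f}x")
```

With these inputs the values round to 759.7× and 51.6×, matching the coverage figures given above.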



Genome annotation 

Genes were identified using Prodigal [36], followed by a round of manual curation using
GenePRIMP [9] for finished genomes and Draft genomes in fewer than 10 scaffolds. The
predicted CDSs were translated and used to search the National Center for Biotechnology
Information (NCBI) nonredundant database, UniProt, TIGRFam, Pfam, KEGG, COG, and InterPro
databases. The tRNAScanSE tool [37] was used to find tRNA genes, whereas ribosomal RNA
genes were found by searches against models of the ribosomal RNA genes built from
SILVA [38]. Other non-coding RNAs such as the RNA components of the protein secretion
complex and the RNase P were identified by searching the genome for the corresponding Rfam
profiles using INFERNAL [39]. Additional gene prediction analysis and manual functional
annotation were performed within the Integrated Microbial Genomes (IMG) platform [40]
developed by the Joint Genome Institute, Walnut Creek, CA, USA [41]. Information in the
tables below reflects the gene information in the JGI annotation on the IMG website [40].

Genome properties 

The genome of C. indolis DSM 755 consists of a 6,383,701 bp circular chromosome with a GC
content of 44.93% (Table 3). Of the 5,903 genes predicted, 5,802 were protein-coding genes
and 101 were RNA genes; 170 pseudogenes were also identified. A putative function was
assigned to 81.21% of genes, with the remainder annotated as hypothetical proteins. The
genome summary and distribution of genes into COG functional categories are listed in
Tables 3 and 4.
Table 3. Nucleotide content and gene count levels of the genome of C. indolis DSM 755

Attribute | Value | % of total(a)
Genome size (bp) | 6,383,701 |
DNA coding region (bp) | 5,688,007 | 89.10
DNA G+C content (bp) | 2,868,247 | 44.93
Total genes(b) | 5,903 | 100.00
RNA genes | 101 | 1.71
Protein-coding genes | 5,802 | 98.29
Protein-coding with function pred. | 4,794 | 81.21
Genes in paralog clusters | 4,527 | 76.69
Genes assigned to COGs | 4,643 | 78.65
Genes with signal peptides | |
Genes with transmembrane helices | 1,494 | 25.31
Paralogous groups | 4,527 | 76.69

a) The total is based on either the size of the genome in base pairs or the total number of
protein coding genes in the annotated genome. b) Also includes 170 pseudogenes.
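
The percentages in Table 3 can be checked directly from the counts. The sketch below is a minimal consistency check, not part of the original analysis; the dictionary and function names are illustrative, and it assumes that base-pair rows are divided by the genome size and gene rows by the total gene count of 5,903, which is the denominator that reproduces the tabulated values.

```python
GENOME_SIZE_BP = 6_383_701
TOTAL_GENES = 5_903

# Rows measured in base pairs (denominator: genome size).
bp_rows = {
    "DNA coding region": 5_688_007,
    "DNA G+C content": 2_868_247,
}

# Rows measured in gene counts (assumed denominator: total genes).
gene_rows = {
    "RNA genes": 101,
    "Protein-coding genes": 5_802,
    "Protein-coding with function pred.": 4_794,
    "Genes assigned to COGs": 4_643,
    "Genes with transmembrane helices": 1_494,
}


def percent(part: int, whole: int) -> float:
    """Percentage of `whole` represented by `part`, rounded to two decimals."""
    return round(100 * part / whole, 2)


table3_percent = {name: percent(v, GENOME_SIZE_BP) for name, v in bp_rows.items()}
table3_percent.update({name: percent(v, TOTAL_GENES) for name, v in gene_rows.items()})

for name, value in table3_percent.items():
    print(f"{name}: {value:.2f}%")
```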


Table 4. Number of genes in C. indolis DSM 755 associated with the 25 general COG functional categories

Code | Value | %age(a) | Description
J | | | Translation
A | | | RNA processing and modification
K | 531 | 10.30 | Transcription
L | 191 | 3.71 | Replication, recombination and repair
B | 1 | 0.02 | Chromatin structure and dynamics
D | | | Cell cycle control, mitosis and meiosis
Y | 0 | 0 | Nuclear structure
V | 107 | 2.08 | Defense mechanisms
T | 335 | 6.50 | Signal transduction mechanisms
M | 235 | 4.56 | Cell wall/membrane biogenesis
N | 70 | 1.36 | Cell motility
Z | 0 | 0 | Cytoskeleton
W | 0 | 0 | Extracellular structures
U | 41 | 0.80 | Intracellular trafficking and secretion
O | 124 | 2.41 | Posttranslational modification, protein turnover, chaperones
C | 261 | 5.06 | Energy production and conversion
G | 910 | 17.65 | Carbohydrate transport and metabolism
E | 493 | 9.56 | Amino acid transport and metabolism
F | 110 | 2.13 | Nucleotide transport and metabolism
H | 153 | 2.97 | Coenzyme transport and metabolism
I | 77 | 1.49 | Lipid transport and metabolism
P | 325 | 6.30 | Inorganic ion transport and metabolism
Q | 70 | 1.36 | Secondary metabolites biosynthesis, transport and catabolism
R | 590 | 11.45 | General function prediction only
S | 319 | 6.19 | Function unknown
- | 1260 | 21.35 | Not in COGs

a) The total is based on the total number of protein coding genes in the annotated genome.



The genomes of C. indolis and its near relatives (C. saccharolyticum, C. hathewayi, and C. phytofermentans) have
similar numbers of genes in each of the 25 broad COG categories (not shown); however, differences exist in the
type and distribution of genes in specific functional groups (Table 5), particularly those related to COG categories
(G) Carbohydrate transport and metabolism, (C) Energy production and conversion, and (Q) Secondary metabolites
biosynthesis, transport and catabolism.
Table 5. Number of genes in each of the 25 general COG functional categories(a) found in C. indolis DSM 755T but
not in closely related species

Code | Value | Description
J | 4 | Translation
A | 0 | RNA processing and modification
K | 5 | Transcription
L | 9 | Replication, recombination and repair
B | 1 | Chromatin structure and dynamics
D | 0 | Cell cycle control, mitosis and meiosis
Y | 0 | Nuclear structure
V | 1 | Defense mechanisms
T | 2 | Signal transduction mechanisms
M | 8 | Cell wall/membrane biogenesis
N | 2 | Cell motility
Z | 0 | Cytoskeleton
W | 0 | Extracellular structures
U | 1 | Intracellular trafficking and secretion
O | 10 | Posttranslational modification, protein turnover, chaperones
C | 28 | Energy production and conversion
G | 6 | Carbohydrate transport and metabolism
E | 8 | Amino acid transport and metabolism
F | 1 | Nucleotide transport and metabolism
H | 11 | Coenzyme transport and metabolism
I | 2 | Lipid transport and metabolism
P | 11 | Inorganic ion transport and metabolism
Q | 10 | Secondary metabolites biosynthesis, transport and catabolism
R | 18 | General function prediction only
S | 21 | Function unknown

a) Number of genes from a set of 158 genes not found in near relatives (C. saccharolyticum, C. phytofermentans,
C. hathewayi) associated with the 25 general COG functional categories.



Carbohydrate transport and metabolism 

Plant biomass is a complex composite of fibrils and sheets of cellulose, hemicellulose,
waxes, pectin, proteins, and lignin. Bacteria from soil and the gut generally possess a
variety of genes to degrade and transport the diversity of substrates encountered in these
plant-rich environments. The genome of C. indolis includes 910 genes (17.65% of total
protein coding genes) in this COG group, including glycoside hydrolases with the potential
to degrade complex carbohydrates including starch, cellulose, and chitin (Table 6), as well
as an abundance of carbohydrate transporters (Figure 2). Almost 8% of the protein-coding
genes in the genome of C. indolis were found to be associated with carbohydrate transport,
represented by two main strategies. ABC (ATP binding cassette) transporters tend to carry
oligosaccharides, and have less affinity for hexoses [43,44], while PTS
(phosphotransferase system) transporters carry many different mono- and disaccharides,
especially hexoses [45]. PTS systems provide a means of regulation via catabolite
repression [46], and are thought to enable bacteria living in carbohydrate-limited
environments to more efficiently utilize and compete for substrates [46]. Both C. indolis
and its near relatives are more highly enriched in ABC than PTS transporters (Figure 2);
however, nearly a third of C. indolis and C. saccharolyticum transporters are PTS genes,
suggesting a preference for hexoses, as well as an adaptation to more marginal
environments. C. indolis also possesses ten genes associated with all three components of
the TRAP-type C4-dicarboxylate transport system, which transports C4-dicarboxylates such as
formate, succinate, and malate [47], as well as six putative malate dehydrogenases and two
putative succinate dehydrogenases, suggesting that C. indolis may have the potential to
utilize both of these short chain fatty acids.


Table 6. Selected carbohydrate active genes in the C. indolis DSM 755T genome

Gene count | Product name(a) | Database ID(b)
19 | Beta-glucosidase (GH-1) | EC:3.2.1.86
8 | Beta-galactosidase/beta-glucuronidase (GH-2) | EC:3.2.1.23, EC:3.2.1.25, EC:3.2.1.31
7 | Beta-glucosidase/related glucosidases (GH-3) | EC:3.2.1.21, EC:3.2.1.52
14 | Alpha-galactosidases/6-phospho-beta-glucosidases (GH-4) | EC:3.2.1.86, EC:3.2.1.122, EC:3.2.1.22
2 | Cellulase, endoglucanase (GH-5) | EC:3.2.1.4
14 | Alpha-amylase | EC:3.2.1.10, EC:3.2.1.20, EC:2.4.1.7, EC:3.2.1.70
8 | Beta-xylosidase (GH 39) | EC:3.2.1.37
2 | Chitinase (GH 18) | EC:3.2.1.14

a) GH designations given from the CAZy database [42]. b) Enzyme Commission (EC) numbers assigned by the
Integrated Microbial Genome (IMG) database [41].




Figure 2. Distribution of ABC and PTS transporters in the genomes of C. indolis and related genomes
(C. saccharolyticum, C. hathewayi, C. phytofermentans), determined from Integrated Microbial Genome (IMG)
annotation [40], viewed based on (a) total number of COGs and (b) percentage of genes in the genome.

Energy production and conversion 

The genome of C. indolis contains 261 genes in COG category (C) Energy production and
conversion, 28 of which are not found in the near relatives analyzed, including genes for
citrate utilization (Table 7) and nitrogen fixation (Table 8).



Citrate utilization 

Citrate is a metabolic intermediate found in all living cells. In aerobic bacteria,
citrate is utilized as part of the tricarboxylic acid (TCA) cycle. In anaerobes, citrate
is fermented to acetate, formate, and/or succinate. The first step is the conversion of
citrate to acetate and oxaloacetate in a reaction catalyzed by citrate lyase
(EC:4.1.3.6) [48]. C. sphenoides, a close relative of C. indolis that does not yet have a
sequenced genome, has been shown to utilize citrate [49], but there is conflicting
evidence as to whether this phenotype is present in C. indolis [28,30]. The genome of
C. indolis reveals a group of seven citrate genes organized in a cluster similar to
operons found in other bacterial species [48,50] (Figure 3), including CitD, CitE, and
CitF, the three subunits of citrate lyase [48]; CitG and CitX, which have been shown to be
necessary for citrate lyase function [50]; CitMHS, a citrate transporter; and a putative
two-component system similar to citrate regulatory mechanisms in other bacteria [51].



Table 7. Selection of C. indolis DSM 755 genes related to citrate utilization.

Locus Tag | Putative Gene Product | Gene ID
K401DRAFT_2892 | holo-ACP synthase (CitX) | EC:2.7.7.61
K401DRAFT_2893 | citrate lyase acyl carrier (CitD) | EC:4.1.3.6
K401DRAFT_2894 | citrate lyase beta subunit (CitE) | EC:4.1.3.6, EC:2.8.3.10
K401DRAFT_2895 | citrate lyase alpha subunit (CitF) | EC:4.1.3.6, EC:2.8.3.10
K401DRAFT_2896 | triphosphoribosyl-dephospho-CoA synthase (CitG) | EC:2.7.8.25
K401DRAFT_2897 | citrate (pro-3S)-lyase ligase (CitC) | EC:6.2.1.22
K401DRAFT_2898 | response regulator, CheY-like receiver domain, winged helix DNA binding domain |
K401DRAFT_2899 | signal transduction histidine kinase |
K401DRAFT_2900 | citrate transporter, CitMHS family | KO:K03303, TC.LCTP

Gene products and Enzyme Commission (EC) numbers assigned by the Integrated Microbial Genome (IMG) database
[41].



[Figure 3: gene map of loci K401DRAFT_2892 to K401DRAFT_2900, scaffold coordinates 3154420 to 3164793]

Figure 3. Citrate utilization genes are in a single gene cluster on K401DRAFT_scaffold00001.1, including the citrate
transporter CitMHS, and a putative two-component system.



Nitrogen Fixation 

Nitrogen fixation has been observed in other Clostridia [52,53] but has not been
demonstrated in the C. saccharolyticum species group. It has been suggested that the
capacity to fix nitrogen confers a selective advantage to cellulolytic microbes that live
in nitrogen-limited environments such as many soils [52]. The functional summary suggests
that C. indolis can fix nitrogen. The C. indolis genome reveals 22 nitrogenase-related
genes in four gene clusters (Table 8), none of which are found in the near relatives
analyzed in this study. A minimum set of six genes encoding structural and biosynthetic
components of a functional nitrogenase complex has been hypothesized [54]. Genes needed
for the nitrogenase structural component proteins (nifH, nifD, and nifK) are present in
C. indolis, but one of the three genes required to synthesize the nitrogenase
iron-molybdenum cofactor (nifN) is not identified. Follow-up experiments are needed to
determine whether C. indolis can fix nitrogen as predicted by the genome analysis.
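
The comparison against the hypothesized minimum gene set amounts to a set difference. The sketch below illustrates that check under stated assumptions: the text names only nifH, nifD, nifK, and nifN, so the remaining two cofactor-biosynthesis genes are taken here to be nifE and nifB, as in the commonly cited six-gene minimum set; the annotated set is read from the gene names in Table 8, and the "nifN-like" entry is deliberately not counted as nifN. All variable names are illustrative.

```python
# Hypothesized minimum set: three structural genes plus three cofactor-biosynthesis genes.
STRUCTURAL_GENES = {"nifH", "nifD", "nifK"}
COFACTOR_GENES = {"nifE", "nifN", "nifB"}  # nifE and nifB are an assumption (see lead-in)
MINIMUM_SET = STRUCTURAL_GENES | COFACTOR_GENES

# Gene names annotated in Table 8; the 'nifN-like' entry is excluded on purpose.
annotated_in_c_indolis = {"nifH", "nifD", "nifK", "nifB", "nifE"}

missing = MINIMUM_SET - annotated_in_c_indolis
structural_complete = STRUCTURAL_GENES <= annotated_in_c_indolis

print("Structural genes complete:", structural_complete)
print("Missing from minimum set:", sorted(missing))
```

Under these assumptions only nifN is missing, which is why the nifN-like locus in Table 8 is the natural target for follow-up work.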
Table 8. Selection of C. indolis DSM 755 genes related to nitrogen fixation.

Locus Tag | Putative Gene Product | Gene ID
K401DRAFT_0533 | nitrogenase Mo-Fe protein, α and β chains | pfam00148
K401DRAFT_0534 | nitrogenase Mo-Fe protein, α and β chains | pfam00148
K401DRAFT_0535 | nitrogenase subunit (ATPase) (nifH) | pfam00142
K401DRAFT_0884 | nitrogenase Mo-Fe protein, α and β chains | pfam00148
K401DRAFT_0885 | nitrogenase Mo-Fe protein, α and β chains | pfam00148
K401DRAFT_0886 | nitrogenase subunit (ATPase) (nifH) | pfam00142
K401DRAFT_3349 | nitrogenase Mo-Fe protein, α and β chains | pfam00148
K401DRAFT_3350 | nitrogenase Mo-Fe protein, α and β chains | pfam00148
K401DRAFT_3351 | nitrogenase subunit (ATPase) (nifH) | pfam00142
K401DRAFT_3874 | nitrogenase Mo-Fe protein, α and β chains (nifD) | pfam00148
K401DRAFT_3875 | nitrogenase Mo-Fe protein, α and β chains (nifK) | pfam00148
K401DRAFT_3876 | nitrogenase Fe protein | pfam00142
K401DRAFT_3878 | nitrogenase Mo-Fe protein, α and β chains (nifD) | pfam00148
K401DRAFT_3879 | nitrogenase Mo-Fe protein, α and β chains (nifK) | pfam00148
K401DRAFT_3880 | dinitrogenase Fe-Mo cofactor (nifH) | pfam02579
K401DRAFT_3895 | nitrogenase Mo-Fe protein, α and β chains (nifD) | pfam00148
K401DRAFT_3896 | nitrogenase Mo-Fe protein, α and β chains (nifK) | pfam00148
K401DRAFT_5519 | nitrogenase Mo-Fe protein, α and β chains (nifB) | pfam04055
K401DRAFT_5520 | nitrogenase Mo-Fe protein, α and β chains (nifE) | pfam00148
K401DRAFT_5521 | nitrogenase Mo-Fe protein (nifK) | pfam00148
K401DRAFT_5522 | nitrogenase component 1, alpha chain (nifN-like) | pfam00148
K401DRAFT_5525 | nitrogenase subunit (ATPase) (nifH) | pfam00142

Nitrogenase genes have a common gene identifier (EC:1.18.6.1), therefore the pfam numbers are given to
distinguish between subunits. Gene product names and pfam numbers assigned by the Integrated Microbial
Genome (IMG) database [41].
Lactate utilization

The genome of C. indolis includes both D- and L-lactate dehydrogenases, which convert
lactate to pyruvate. Additionally, there is a lactate transporter, suggesting that
C. indolis is able to utilize exogenous lactate (Table 9).



Table 9. Selection of C. indolis DSM 755 genes related to lactate utilization.

Locus Tag | Putative Gene Product | Gene ID
K401DRAFT_1877 | L-lactate dehydrogenase | EC:1.1.1.27
K401DRAFT_5775 | L-lactate dehydrogenase | EC:1.1.1.27
K401DRAFT_3431 | L-lactate transporter, LctP family | TC.LCTP
K401DRAFT_3220 | D-lactate dehydrogenase | EC:1.1.1.28

Annotations assigned by the Integrated Microbial Genome (IMG) database [41].



Bacterial microcompartments (BMC) 

The C. indolis genome contains genes associated with bacterial microcompartment shell
proteins. Bacterial microcompartments (BMCs) are proteinaceous organelles involved in the
metabolism of ethanolamine, 1,2-propanediol, and possibly other metabolites (reviewed in
[55-57]). BMCs are often encoded by a single operon or contiguous stretch of DNA. The
different metabolic types of BMCs can be distinguished by a key enzyme (e.g., ethanolamine
lyase and propanediol dehydratase) related to their metabolic function. While the other
associated genes in the operon can vary, they frequently include an alcohol dehydrogenase,
an aldehyde dehydrogenase, an aldolase, and an oxidoreductase.

In C. indolis there are 2 separate genetic loci that code for BMCs (Tables 10 and 11 and
Figure 4). One C. indolis locus (Table 10) contains a gene (K401DRAFT_2189) with sequence
similarity to a B12-independent propanediol dehydratase found in Roseburia inulinivorans
and Clostridium phytofermentans [58,59] (both members of the Lachnospiraceae). This enzyme
has been shown to be involved in the metabolism of fucose and rhamnose [58,59] and was
subsequently categorized as the glycyl radical prosthetic group-based (grp) BMC [60]. The
glycyl radical family of enzymes was recently expanded to include a choline trimethylamine
lyase activity that is part of a microcompartment locus in Desulfovibrio desulfuricans
[61]. The corresponding C. indolis enzymes (K401DRAFT_2189 and K401DRAFT_2190) are more
similar to the D. desulfuricans protein, but there are differences in the gene content of
the microcompartment loci. Further work is needed to determine the physiological role of
this microcompartment.

The second C. indolis BMC locus (Table 11 and Figure 4) is even more enigmatic. This locus
contains the shell proteins, alcohol dehydrogenase, aldehyde dehydrogenase, aldolase, and
oxidoreductase commonly found in microcompartments, but it lacks a known key enzyme.
Homologs of this operon were found in four other bacterial species (Figure 4). They are
all missing a known key enzyme and contain 2 genes annotated as CoA-transferase. We
propose that the C. indolis genome and these other bacteria contain a novel type of
microcompartment, designated the CoAT BMC. It is not clear that the function of the 2
annotated CoA-transferase genes is as predicted, and further research is needed to
demonstrate the physiological role of this BMC.
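
The reasoning used to separate the two loci (shell proteins present, a known key enzyme present or absent, and two CoA-transferase annotations) can be sketched as a small rule-based check. This is an illustrative sketch only, not the authors' method: the function name, the keyword map, the returned labels, and the two CoA-transferase placeholder entries (whose database IDs are not given in this section) are assumptions introduced here.

```python
# Shell-protein families that appear in the BMC tables of this report.
SHELL_PFAMS = {"pfam00936", "pfam03319"}

# Hypothetical keyword map from a key-enzyme annotation to a BMC type.
KEY_ENZYME_KEYWORDS = {
    "ethanolamine ammonia-lyase": "eut",
    "propanediol dehydratase": "pdu",
    "pyruvate formate lyase": "grp",  # annotation carried by the glycyl radical enzyme
}


def classify_bmc_locus(genes):
    """Label a locus given (product name, database ID) pairs; the rules are illustrative."""
    products = [product.lower() for product, _ in genes]
    if not any(db_id in SHELL_PFAMS for _, db_id in genes):
        return "no shell protein found"
    for keyword, label in KEY_ENZYME_KEYWORDS.items():
        if any(keyword in product for product in products):
            return label
    if sum("coa-transferase" in product for product in products) >= 2:
        return "candidate CoAT"
    return "unclassified BMC"


grp_locus = [
    ("Carbon dioxide concentrating mechanism/carboxysome shell protein", "pfam00936"),
    ("Pyruvate formate lyase", "pfam02901"),
    ("Pyruvate formate lyase activating enzyme", "pfam04055"),
]
coat_locus = [
    ("eutM, ethanolamine utilization protein", "pfam00936"),
    ("pduP, propionaldehyde dehydrogenase", "pfam00171"),
    ("fucA, L-fuculose-phosphate aldolase", "EC:4.1.2.17"),
    ("CoA-transferase", None),  # placeholder entry, database ID not given here
    ("CoA-transferase", None),  # placeholder entry, database ID not given here
]

print(classify_bmc_locus(grp_locus))
print(classify_bmc_locus(coat_locus))
```

Keyword matching on product names is a deliberately crude stand-in for sequence-based comparison; it only mirrors the presence/absence logic described in the text.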
Table 10. grp-BMC genes found in the C. indolis genome.

Locus Tag | Product Name | Gene ID/Protein Information
K401DRAFT_2181 | Predicted transcriptional regulator | COG0789
K401DRAFT_2182 | Predicted membrane protein | COG2510
K401DRAFT_2183 | Carbon dioxide concentrating mechanism/carboxysome shell protein | pfam00936
K401DRAFT_2184 | Predicted membrane protein | pfam00936
K401DRAFT_2185 | Hypothetical protein | -
K401DRAFT_2186 | Carbon dioxide concentrating mechanism/carboxysome shell protein | pfam00936
K401DRAFT_2187 | Carbon dioxide concentrating mechanism/carboxysome shell protein | pfam00936
K401DRAFT_2188 | NAD-dependent aldehyde dehydrogenase | pfam00171
K401DRAFT_2189 | Pyruvate formate lyase | pfam02901
K401DRAFT_2190 | Pyruvate formate lyase activating enzyme | pfam04055
K401DRAFT_2191 | Ethanolamine utilization protein | pfam00936
K401DRAFT_2192 | Ethanolamine utilization protein | pfam10662
K401DRAFT_2193 | Alcohol dehydrogenase, class IV | pfam00465
K401DRAFT_2194 | Ethanolamine utilization cobalamin adenosyltransferase | COG4892
K401DRAFT_2195 | Ethanolamine utilization protein, possible chaperonin | COG4820
K401DRAFT_2196 | Carbon dioxide concentrating mechanism/carboxysome shell protein | pfam00936
K401DRAFT_2197 | Carbon dioxide concentrating mechanism/carboxysome shell protein | pfam03319
K401DRAFT_2198 | Ethanolamine utilization protein | pfam06249
K401DRAFT_2199 | Carbon dioxide concentrating mechanism/carboxysome shell protein | pfam00936
K401DRAFT_2200 | NAD-dependent aldehyde dehydrogenase | pfam00171
K401DRAFT_2201 | Propanediol utilization protein | pfam06130
K401DRAFT_2202 | Carbon dioxide concentrating mechanism/carboxysome shell protein | pfam00936

Annotations assigned by the Integrated Microbial Genome (IMG) database [41].



The second C. indolis BMC locus (Table 11 and Figure 4) is
even more enigmatic. This locus contains the shell proteins,
alcohol dehydrogenase, aldehyde dehydrogenase, aldolase and
oxidoreductase commonly found in microcompartments, but it
lacks a known key enzyme. Homologs of this operon were found
in four other bacterial species (Figure 4). They are all
missing a known key enzyme and contain two genes annotated as
CoA-transferase. We propose that the genomes of C. indolis
and these other bacteria encode a novel type of
microcompartment, designated the CoAT BMC. It is not clear
whether the two annotated CoA-transferase genes function as
predicted, and further research is needed to demonstrate the
physiological role of this BMC.
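
The reasoning behind the CoAT BMC proposal (shell proteins present, two CoA-transferases present, no known key enzyme) can be illustrated with a short tally over the Table 11 annotations. The following is only a minimal sketch, not the analysis pipeline used for this genome: the function name summarize_locus and the signature_ids parameter are hypothetical, and the set of identifiers that would count as a "known key enzyme" is deliberately left empty because the text does not list them.

```python
from collections import Counter

# Domain identifiers exactly as they appear in Table 11.
# pfam00936 (BMC domain) and pfam03319 (EutN_CcmL) mark shell proteins.
SHELL_PFAMS = {"pfam00936", "pfam03319"}
COA_TRANSFERASE_PFAM = "pfam01144"

# (locus tag, product name, annotation) rows transcribed from Table 11.
COAT_BMC_LOCUS = [
    ("K401DRAFT_4970", "DeoRC transcriptional regulator", "pfam00455"),
    ("K401DRAFT_4969", "fucA, L-fuculose-phosphate aldolase", "EC:4.1.2.17"),
    ("K401DRAFT_4968", "pduP, propionaldehyde dehydrogenase", "pfam00171"),
    ("K401DRAFT_4967", "eutM, ethanolamine utilization protein", "pfam00936"),
    ("K401DRAFT_4966", "carboxysome shell protein", "pfam00936"),
    ("K401DRAFT_4965", "carboxysome shell protein", "pfam00936"),
    ("K401DRAFT_4964", "carboxysome shell protein", "pfam00936"),
    ("K401DRAFT_4963", "PduL, propanediol utilization protein", "pfam06130"),
    ("K401DRAFT_4962", "eutN_CcmL", "pfam03319"),
    ("K401DRAFT_4961", "SBP_bac_8, ABC-type sugar transporter", "pfam13416"),
    ("K401DRAFT_4960", "NAD(FAD)-dependent dehydrogenase", "COG0446"),
    ("K401DRAFT_4959", "CoA-transferase", "pfam01144"),
    ("K401DRAFT_4958", "CoA-transferase", "pfam01144"),
    ("K401DRAFT_4957", "Fe-ADH, alcohol dehydrogenase", "pfam00465"),
]


def summarize_locus(rows, signature_ids=frozenset()):
    """Tally shell proteins and CoA-transferases in an annotated locus.

    signature_ids is a hypothetical, caller-supplied set of annotation
    identifiers treated as "known key enzymes"; it is empty by default
    because the source text does not enumerate them.
    """
    counts = Counter(annotation for _, _, annotation in rows)
    return {
        "genes": len(rows),
        "shell_proteins": sum(counts[p] for p in SHELL_PFAMS),
        "coa_transferases": counts[COA_TRANSFERASE_PFAM],
        "signature_hits": [tag for tag, _, ann in rows if ann in signature_ids],
    }


summary = summarize_locus(COAT_BMC_LOCUS)
print(summary)
```

A locus with several shell proteins, two pfam01144 genes, and an empty signature_hits list matches the pattern described above; a real screen would need a curated list of key-enzyme identifiers supplied through signature_ids.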

Figure 4. CoAT BMC operon found in C. indolis, Caldalkalibacillus thermarum, C. sticklandii, C.
saccharolyticum, and Bacillus selenitireducens. Gene details are found in Table 11. Genome
regions shown: Clostridium indolis DSM 755 (K401DRAFT scaffold00001.1), Caldalkalibacillus
thermarum TA2.A1 (ctg233, NZ_AFCE01000150), Clostridium sticklandii DSM 519 (chromosome,
NC_014614), Clostridium saccharolyticum WM1 (chromosome, NC_014376), and Bacillus
selenitireducens MLS10 (chromosome, NC_014219).



Table 11. CoAT BMC genes found in the C. indolis genome.

Locus Tag       Product Name                                                      Gene ID/Protein Information
K401DRAFT_4970  DeoRC transcriptional regulator                                   pfam00455
K401DRAFT_4969  fucA, L-fuculose-phosphate aldolase                               EC:4.1.2.17
K401DRAFT_4968  pduP, propionaldehyde dehydrogenase                               pfam00171
K401DRAFT_4967  eutM, ethanolamine utilization protein                            pfam00936
K401DRAFT_4966  Carbon dioxide concentrating mechanism/carboxysome shell protein  pfam00936
K401DRAFT_4965  Carbon dioxide concentrating mechanism/carboxysome shell protein  pfam00936
K401DRAFT_4964  Carbon dioxide concentrating mechanism/carboxysome shell protein  pfam00936
K401DRAFT_4963  PduL, propanediol utilization protein                             pfam06130
K401DRAFT_4962  eutN_CcmL                                                         pfam03319
K401DRAFT_4961  SBP_bac_8, ABC-type sugar transporter                             pfam13416
K401DRAFT_4960  Uncharacterized NAD(FAD)-dependent dehydrogenase                  COG0446
K401DRAFT_4959  CoA-transferase                                                   pfam01144
K401DRAFT_4958  CoA-transferase                                                   pfam01144
K401DRAFT_4957  Fe-ADH, Alcohol dehydrogenase                                     pfam00465



Annotations assigned by the Integrated Microbial Genome (IMG) database [41] 
Secondary metabolites biosynthesis, transport and catabolism

Protocatechuate and other aromatics are intermediaries in the
degradation of lignin in plant-rich environments [62]. The
genome of C. indolis contains two protocatechuate dioxygenases
and an aromatic hydrolase, revealing the potential for
utilizing aromatic compounds (Table 12).



Table 12. Selection of C. indolis DSM 755^ genes related to degradation of aromatics.

Locus Tag       Putative Gene Product                         Gene ID
K401DRAFT_3571  Protocatechuate 3,4-dioxygenase beta subunit  EC:1.13.11.3
K401DRAFT_3568  Protocatechuate 3,4-dioxygenase beta subunit  EC:1.13.11.3
K401DRAFT_3412  Aromatic ring hydroxylase                     EC:5.3.3.3, EC:4.2.1.120



Annotations assigned by the Integrated Microbial Genome (IMG) database [41] 



Conclusion 

The genomic sequence of C. indolis reported here 
reveals the metabolic potential of this organism to 
utilize a wide assortment of fermentable carbohy- 
drates and intermediates including citrate, lactate, 
malate, succinate, and aromatics, and points to po- 
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